class: center, middle, inverse, title-slide .title[ # Inferentials with Infer
] .author[ ### Week 12 ] ---
# Packages needed and a Note about Icons Please load up the following packages. Remember to first install the ones you don't have. ```r library(tidyverse) library(infer) library(patchwork) ``` You may come across the following icons. The table below lists what each means. <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:center;"> Icon </th> <th style="text-align:left;"> Description </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;width: 10em; "> <svg aria-hidden="true" role="img" viewbox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill: #4682b4;overflow:visible;position:relative;"><path d="M52.51 440.6l171.5-142.9V214.3L52.51 71.41C31.88 54.28 0 68.66 0 96.03v319.9C0 443.3 31.88 457.7 52.51 440.6zM308.5 440.6l192-159.1c15.25-12.87 15.25-36.37 0-49.24l-192-159.1c-20.63-17.12-52.51-2.749-52.51 24.62v319.9C256 443.3 287.9 457.7 308.5 440.6z"></path></svg> </td> <td style="text-align:left;width: 40em; "> Indicates that an example continues on the following slide. </td> </tr> <tr> <td style="text-align:center;width: 10em; "> <svg aria-hidden="true" role="img" viewbox="0 0 384 512" style="height:1em;width:0.75em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#ff6347;overflow:visible;position:relative;"><path d="M384 128v255.1c0 35.35-28.65 64-64 64H64c-35.35 0-64-28.65-64-64V128c0-35.35 28.65-64 64-64H320C355.3 64 384 92.65 384 128z"></path></svg> </td> <td style="text-align:left;width: 40em; "> Indicates that a section using common syntax has ended. 
</td> </tr> <tr> <td style="text-align:center;width: 10em; "> <svg aria-hidden="true" role="img" viewbox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#5cb85c;overflow:visible;position:relative;"><path d="M172.5 131.1C228.1 75.51 320.5 75.51 376.1 131.1C426.1 181.1 433.5 260.8 392.4 318.3L391.3 319.9C381 334.2 361 337.6 346.7 327.3C332.3 317 328.9 297 339.2 282.7L340.3 281.1C363.2 249 359.6 205.1 331.7 177.2C300.3 145.8 249.2 145.8 217.7 177.2L105.5 289.5C73.99 320.1 73.99 372 105.5 403.5C133.3 431.4 177.3 435 209.3 412.1L210.9 410.1C225.3 400.7 245.3 404 255.5 418.4C265.8 432.8 262.5 452.8 248.1 463.1L246.5 464.2C188.1 505.3 110.2 498.7 60.21 448.8C3.741 392.3 3.741 300.7 60.21 244.3L172.5 131.1zM467.5 380C411 436.5 319.5 436.5 263 380C213 330 206.5 251.2 247.6 193.7L248.7 192.1C258.1 177.8 278.1 174.4 293.3 184.7C307.7 194.1 311.1 214.1 300.8 229.3L299.7 230.9C276.8 262.1 280.4 306.9 308.3 334.8C339.7 366.2 390.8 366.2 422.3 334.8L534.5 222.5C566 191 566 139.1 534.5 108.5C506.7 80.63 462.7 76.99 430.7 99.9L429.1 101C414.7 111.3 394.7 107.1 384.5 93.58C374.2 79.2 377.5 59.21 391.9 48.94L393.5 47.82C451 6.731 529.8 13.25 579.8 63.24C636.3 119.7 636.3 211.3 579.8 267.7L467.5 380z"></path></svg> </td> <td style="text-align:left;width: 40em; "> Indicates that there is an active hyperlink on the slide. </td> </tr> <tr> <td style="text-align:center;width: 10em; "> <svg aria-hidden="true" role="img" viewbox="0 0 384 512" style="height:1em;width:0.75em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#faffbd;overflow:visible;position:relative;"><path d="M384 48V512l-192-112L0 512V48C0 21.5 21.5 0 48 0h288C362.5 0 384 21.5 384 48z"></path></svg> </td> <td style="text-align:left;width: 40em; "> Indicates that a section covering a concept has ended. 
</td> </tr> </tbody> </table> --- # Load the General Social Survey data ``` ## # A tibble: 6 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $25000… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $20000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $25000… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $25000… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $25000… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $25000… midd… average 1.09 ``` -- Check the unique values of `college`, `income`, and `finrela` ``` ## [1] degree no degree ## Levels: no degree degree ``` ``` ## [1] $25000 or more $20000 - 24999 $15000 - 19999 $8000 to 9999 $10000 - 14999 ## [6] $5000 to 5999 $6000 to 6999 $4000 to 4999 $1000 to 2999 $7000 to 7999 ## [11] $3000 to 3999 lt $1000 ## 12 Levels: lt $1000 < $1000 to 2999 < $3000 to 3999 < ... < $25000 or more ``` ``` ## [1] below average above average average far below average ## [5] DK far above average ## 6 Levels: far below average below average average ... 
DK ``` --- Let's test the association between educational attainment (`college`) and self-rated family income (`finrela`) count: false .panel1-sw1-auto[ ```r *gss ``` ] .panel2-sw1-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw1-auto[ ```r gss %>% * filter(finrela != "DK") ``` ] .panel2-sw1-auto[ ``` ## # A tibble: 495 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 485 more rows ``` ] --- count: false .panel1-sw1-auto[ 
```r gss %>% filter(finrela != "DK") %>% * ggplot() ``` ] .panel2-sw1-auto[ ![](Slides-Week-12R_files/figure-html/sw1_auto_03_output-1.png)<!-- --> ] --- count: false .panel1-sw1-auto[ ```r gss %>% filter(finrela != "DK") %>% ggplot() + * aes(x = finrela, fill = college) ``` ] .panel2-sw1-auto[ ![](Slides-Week-12R_files/figure-html/sw1_auto_04_output-1.png)<!-- --> ] --- count: false .panel1-sw1-auto[ ```r gss %>% filter(finrela != "DK") %>% ggplot() + aes(x = finrela, fill = college) + * geom_bar(position = "fill") ``` ] .panel2-sw1-auto[ ![](Slides-Week-12R_files/figure-html/sw1_auto_05_output-1.png)<!-- --> ] --- count: false .panel1-sw1-auto[ ```r gss %>% filter(finrela != "DK") %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + * scale_fill_brewer(palette = "Pastel1") ``` ] .panel2-sw1-auto[ ![](Slides-Week-12R_files/figure-html/sw1_auto_06_output-1.png)<!-- --> ] --- count: false .panel1-sw1-auto[ ```r gss %>% filter(finrela != "DK") %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + scale_fill_brewer(palette = "Pastel1") + * theme(axis.text.x = element_text(size = 12, * angle = 45, * vjust = 0.5)) ``` ] .panel2-sw1-auto[ ![](Slides-Week-12R_files/figure-html/sw1_auto_07_output-1.png)<!-- --> ] --- count: false .panel1-sw1-auto[ ```r gss %>% filter(finrela != "DK") %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + scale_fill_brewer(palette = "Pastel1") + theme(axis.text.x = element_text(size = 12, angle = 45, vjust = 0.5)) + * theme_minimal() ``` ] .panel2-sw1-auto[ ![](Slides-Week-12R_files/figure-html/sw1_auto_08_output-1.png)<!-- --> ] <style> .panel1-sw1-auto { color: white; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw1-auto { color: white; width: 49%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw1-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> .right[
] --- Now calculate the observed statistic count: false .panel1-sw2-auto[ ```r *gss ``` ] .panel2-sw2-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw2-auto[ ```r gss %>% * specify(college ~ finrela) # compare variables ``` ] .panel2-sw2-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw2-auto[ ```r gss %>% specify(college ~ finrela) %>% # compare variables * hypothesise(null = "independence") # declare null hypothesis ``` ] .panel2-sw2-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no 
degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw2-auto[ ```r gss %>% specify(college ~ finrela) %>% # compare variables hypothesise(null = "independence") %>% # declare null hypothesis * calculate(stat = "Chisq") # compute the chi-squared statistic ``` ] .panel2-sw2-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 1 × 1 ## stat ## <dbl> ## 1 30.7 ``` ] <style> .panel1-sw2-auto { color: white; width: 45.4146341463415%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw2-auto { color: white; width: 52.5853658536585%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw2-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- The observed `\(\chi^2\)` statistic is about 30.68. Next, we compare this statistic to a null distribution, generated under the assumption that the two variables are unrelated, to see how likely a statistic at least this large would be if there were actually no association between education and income. 
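
The comparison described above can be sketched in a few lines. This is a minimal sketch, not the code behind these slides: it assumes the `gss` data that ships with `infer`, reuses the name `observed_indep_statistic` that later slides refer to, and introduces `null_dist_sketch` and `p_value_sketch` purely as illustrative names. The seed value is arbitrary.

```r
library(infer)  # provides gss, the %>% pipe, and the specify/hypothesize verbs

set.seed(12)  # arbitrary seed (an assumption) so the permutations are reproducible

# Observed chi-squared statistic for college vs. finrela
observed_indep_statistic <- gss %>%
  specify(college ~ finrela) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "Chisq")

# Null distribution: shuffle the pairing 1000 times and recompute the statistic
null_dist_sketch <- gss %>%
  specify(college ~ finrela) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "Chisq")

# Share of permuted statistics at least as large as the observed one
p_value_sketch <- null_dist_sketch %>%
  get_p_value(obs_stat = observed_indep_statistic, direction = "greater")

p_value_sketch
```

A small value in `p_value_sketch` means that few of the permuted statistics reached the observed one, which would count as evidence against independence. With only 1000 replicates the reported p-value is itself an approximation.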
--- # Null Distribution Simulation Let's take a look at the null distribution count: false .panel1-sw3-auto[ ```r *gss ``` ] .panel2-sw3-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw3-auto[ ```r gss %>% * specify(college ~ finrela) ``` ] .panel2-sw3-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw3-auto[ ```r gss %>% specify(college ~ finrela) %>% * assume(distribution = "Chisq") ``` ] .panel2-sw3-auto[ ``` ## A Chi-squared distribution with 5 degrees of freedom. 
``` ] <style> .panel1-sw3-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw3-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw3-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- --- Either way, it looks like our observed test statistic would be quite unlikely if there were actually no association between education and income. More exactly, we can approximate the *p*-value with `get_p_value`: --- count: false .panel1-sw4-auto[ ```r *gss ``` ] .panel2-sw4-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw4-auto[ ```r gss %>% * specify(college ~ finrela) ``` ] .panel2-sw4-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 
more rows ``` ] --- count: false .panel1-sw4-auto[ ```r gss %>% specify(college ~ finrela) %>% * hypothesise(null = "independence") ``` ] .panel2-sw4-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw4-auto[ ```r gss %>% specify(college ~ finrela) %>% hypothesise(null = "independence") %>% * generate(reps = 1000, type = "permute") ``` ] .panel2-sw4-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 500,000 × 3 ## # Groups: replicate [1,000] ## college finrela replicate ## <fct> <fct> <int> ## 1 no degree below average 1 ## 2 no degree below average 1 ## 3 no degree below average 1 ## 4 degree above average 1 ## 5 no degree above average 1 ## 6 degree average 1 ## 7 degree below average 1 ## 8 no degree above average 1 ## 9 degree average 1 ## 10 degree far below average 1 ## # … with 499,990 more rows ``` ] --- count: false .panel1-sw4-auto[ ```r gss %>% specify(college ~ finrela) %>% hypothesise(null = "independence") %>% generate(reps = 1000, type = "permute") %>% * assume(distribution = "Chisq") ``` ] .panel2-sw4-auto[ ``` ## A Chi-squared distribution with 5 degrees of freedom. 
``` ] <style> .panel1-sw4-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw4-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw4-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- --- count: false .panel1-sw5-auto[ ```r *null_distribution ``` ] .panel2-sw5-auto[ ``` ## A Chi-squared distribution with 5 degrees of freedom. ``` ] --- count: false .panel1-sw5-auto[ ```r null_distribution %>% * visualize() ``` ] .panel2-sw5-auto[ ![](Slides-Week-12R_files/figure-html/sw5_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw5-auto[ ```r null_distribution %>% visualize() + * shade_p_value(observed_indep_statistic, * direction = "greater") ``` ] .panel2-sw5-auto[ ![](Slides-Week-12R_files/figure-html/sw5_auto_03_output-1.png)<!-- --> ] <style> .panel1-sw5-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw5-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw5-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # `\(\chi^2\)` statistic ``` ## Warning: The chisq_stat() wrapper has been deprecated in favor of the more ## general observe(). Please use that function instead. 
``` ``` ## X-squared ## 30.68252 ``` --- # `\(\chi^2\)` Goodness of Fit count: false .panel1-sw6-auto[ ```r *gss ``` ] .panel2-sw6-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw6-auto[ ```r gss %>% * specify(response = finrela) ``` ] .panel2-sw6-auto[ ``` ## Response: finrela (factor) ## # A tibble: 500 × 1 ## finrela ## <fct> ## 1 below average ## 2 below average ## 3 below average ## 4 above average ## 5 above average ## 6 average ## 7 below average ## 8 above average ## 9 average ## 10 far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw6-auto[ ```r gss %>% specify(response = finrela) %>% * hypothesise(null = "point", * p = c("far below average" = 1/6, * "below average" = 1/6, * "average" = 1/6, * "above average" = 1/6, * "far above average" = 1/6, * "DK" = 1/6)) ``` ] .panel2-sw6-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 500 × 1 ## finrela ## <fct> ## 1 below average ## 2 below average ## 3 below average ## 4 above average ## 5 above average ## 6 average ## 7 below average ## 8 above average ## 9 average ## 10 far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw6-auto[ 
```r gss %>% specify(response = finrela) %>% hypothesise(null = "point", p = c("far below average" = 1/6, "below average" = 1/6, "average" = 1/6, "above average" = 1/6, "far above average" = 1/6, "DK" = 1/6)) %>% * calculate(stat = "Chisq") ``` ] .panel2-sw6-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 1 × 1 ## stat ## <dbl> ## 1 488. ``` ] <style> .panel1-sw6-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw6-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw6-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ```r observed_gof_statistic <- gss %>% specify(response = finrela) %>% hypothesise(null = "point", p = c("far below average" = 1/6, "below average" = 1/6, "average" = 1/6, "above average" = 1/6, "far above average" = 1/6, "DK" = 1/6)) %>% calculate(stat = "Chisq") ``` --- ```r null_dist_gof <- gss %>% specify(response = finrela) %>% hypothesise(null = "point", p = c("far below average" = 1/6, "below average" = 1/6, "average" = 1/6, "above average" = 1/6, "far above average" = 1/6, "DK" = 1/6)) %>% generate(reps = 1000, type = "draw") %>% # we only added this! 
calculate(stat = "Chisq") ``` --- count: false .panel1-sw7-auto[ ```r *null_dist_gof ``` ] .panel2-sw7-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## <fct> <dbl> ## 1 1 5.46 ## 2 2 2.92 ## 3 3 8.01 ## 4 4 10.9 ## 5 5 0.784 ## 6 6 3.59 ## 7 7 6.28 ## 8 8 6.71 ## 9 9 4.58 ## 10 10 8.8 ## # … with 990 more rows ``` ] --- count: false .panel1-sw7-auto[ ```r null_dist_gof %>% * visualize() ``` ] .panel2-sw7-auto[ ![](Slides-Week-12R_files/figure-html/sw7_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw7-auto[ ```r null_dist_gof %>% visualize() + * shade_p_value(observed_gof_statistic, * direction = "greater") ``` ] .panel2-sw7-auto[ ![](Slides-Week-12R_files/figure-html/sw7_auto_03_output-1.png)<!-- --> ] <style> .panel1-sw7-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw7-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw7-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- count: false .panel1-sw8-auto[ ```r *gss ``` ] .panel2-sw8-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… 
work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw8-auto[ ```r gss %>% * ggplot() ``` ] .panel2-sw8-auto[ ![](Slides-Week-12R_files/figure-html/sw8_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw8-auto[ ```r gss %>% ggplot() + * aes(x = finrela, fill = college) ``` ] .panel2-sw8-auto[ ![](Slides-Week-12R_files/figure-html/sw8_auto_03_output-1.png)<!-- --> ] --- count: false .panel1-sw8-auto[ ```r gss %>% ggplot() + aes(x = finrela, fill = college) + * geom_bar(position = "fill") ``` ] .panel2-sw8-auto[ ![](Slides-Week-12R_files/figure-html/sw8_auto_04_output-1.png)<!-- --> ] --- count: false .panel1-sw8-auto[ ```r gss %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + * scale_fill_brewer(type = "qual") ``` ] .panel2-sw8-auto[ ![](Slides-Week-12R_files/figure-html/sw8_auto_05_output-1.png)<!-- --> ] --- count: false .panel1-sw8-auto[ ```r gss %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + scale_fill_brewer(type = "qual") + * theme(axis.text.x = element_text(angle = 45, * vjust = .5)) ``` ] .panel2-sw8-auto[ ![](Slides-Week-12R_files/figure-html/sw8_auto_06_output-1.png)<!-- --> ] --- count: false .panel1-sw8-auto[ ```r gss %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + scale_fill_brewer(type = "qual") + theme(axis.text.x = element_text(angle = 45, vjust = .5)) + * labs(x = "finrela: Self-Identification of Income Class", * y = "Proportion") ``` ] .panel2-sw8-auto[ ![](Slides-Week-12R_files/figure-html/sw8_auto_07_output-1.png)<!-- --> ] --- count: false .panel1-sw8-auto[ ```r gss %>% ggplot() + aes(x = finrela, fill = college) + geom_bar(position = "fill") + scale_fill_brewer(type = "qual") + theme(axis.text.x = element_text(angle = 45, vjust = .5)) + labs(x = "finrela: Self-Identification of Income Class", y = "Proportion") + * theme_minimal() ``` ] .panel2-sw8-auto[ 
![](Slides-Week-12R_files/figure-html/sw8_auto_08_output-1.png)<!-- --> ] <style> .panel1-sw8-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw8-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw8-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- - If there were no relationship, we would expect the purple bars to reach the same height regardless of income class. -- - Are the differences we see in the plot just due to random noise? -- - We can obtain the null distribution in one of two ways - randomization: approximates the null distribution by permuting the response and explanatory variables, so that each person's educational attainment is matched with a random income from the sample, breaking up any association between the two -- - theory-based methods: approximate the null distribution with a theoretical `\(\chi^2\)` distribution instead of simulating it --- # Generate the null distribution using randomization count: false .panel1-sw9-auto[ ```r *gss ``` ] .panel2-sw9-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false 
.panel1-sw9-auto[ ```r gss %>% * specify(college ~ finrela) ``` ] .panel2-sw9-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw9-auto[ ```r gss %>% specify(college ~ finrela) %>% * hypothesize(null = "independence") ``` ] .panel2-sw9-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw9-auto[ ```r gss %>% specify(college ~ finrela) %>% hypothesize(null = "independence") %>% * generate(reps = 1000, type = "permute") ``` ] .panel2-sw9-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 500,000 × 3 ## # Groups: replicate [1,000] ## college finrela replicate ## <fct> <fct> <int> ## 1 degree below average 1 ## 2 no degree below average 1 ## 3 no degree below average 1 ## 4 degree above average 1 ## 5 no degree above average 1 ## 6 no degree average 1 ## 7 no degree below average 1 ## 8 no degree above average 1 ## 9 degree average 1 ## 10 no degree far below average 1 ## # … with 499,990 more rows ``` ] --- count: false .panel1-sw9-auto[ ```r gss %>% specify(college ~ finrela) %>% hypothesize(null = "independence") %>% generate(reps = 1000, type = "permute") %>% * 
calculate(stat = "Chisq") ``` ] .panel2-sw9-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## <int> <dbl> ## 1 1 1.18 ## 2 2 4.25 ## 3 3 7.82 ## 4 4 4.80 ## 5 5 3.26 ## 6 6 0.516 ## 7 7 8.50 ## 8 8 6.58 ## 9 9 1.88 ## 10 10 0.858 ## # … with 990 more rows ``` ] <style> .panel1-sw9-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw9-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw9-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ```r null_dist_sim <- gss %>% specify(college ~ finrela) %>% hypothesize(null = "independence") %>% generate(reps = 1000, type = "permute") %>% calculate(stat = "Chisq") ``` --- # Generate the null distribution using theory based methods count: false .panel1-sw10-auto[ ```r *gss ``` ] .panel2-sw10-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw10-auto[ ```r gss %>% * specify(college ~ finrela) ``` ] .panel2-sw10-auto[ ``` ## Response: college 
(factor) ## Explanatory: finrela (factor) ## # A tibble: 500 × 2 ## college finrela ## <fct> <fct> ## 1 degree below average ## 2 no degree below average ## 3 degree below average ## 4 no degree above average ## 5 degree above average ## 6 no degree average ## 7 no degree below average ## 8 degree above average ## 9 degree average ## 10 no degree far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw10-auto[ ```r gss %>% specify(college ~ finrela) %>% * assume(distribution = "Chisq") ``` ] .panel2-sw10-auto[ ``` ## A Chi-squared distribution with 5 degrees of freedom. ``` ] <style> .panel1-sw10-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw10-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw10-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- ```r null_dist_theory <- gss %>% specify(college ~ finrela) %>% assume(distribution = "Chisq") ``` --- # Visualize the randomized null distribution and test statistic count: false .panel1-sw11-auto[ ```r *null_dist_sim ``` ] .panel2-sw11-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## <int> <dbl> ## 1 1 8.65 ## 2 2 3.54 ## 3 3 2.99 ## 4 4 20.2 ## 5 5 2.74 ## 6 6 6.43 ## 7 7 7.42 ## 8 8 3.58 ## 9 9 1.58 ## 10 10 1.69 ## # … with 990 more rows ``` ] --- count: false .panel1-sw11-auto[ ```r null_dist_sim %>% * visualize() ``` ] .panel2-sw11-auto[ ![](Slides-Week-12R_files/figure-html/sw11_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw11-auto[ ```r null_dist_sim %>% visualize() + * shade_p_value(observed_indep_statistic, * direction = "greater") ``` ] .panel2-sw11-auto[ ![](Slides-Week-12R_files/figure-html/sw11_auto_03_output-1.png)<!-- --> ] <style> .panel1-sw11-auto { color: white; width: 
44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw11-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw11-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Visualize the *theoretical* null distribution and test statistic count: false .panel1-sw12-auto[ ```r *null_dist_sim ``` ] .panel2-sw12-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## <int> <dbl> ## 1 1 8.65 ## 2 2 3.54 ## 3 3 2.99 ## 4 4 20.2 ## 5 5 2.74 ## 6 6 6.43 ## 7 7 7.42 ## 8 8 3.58 ## 9 9 1.58 ## 10 10 1.69 ## # … with 990 more rows ``` ] --- count: false .panel1-sw12-auto[ ```r null_dist_sim %>% * visualize() ``` ] .panel2-sw12-auto[ ![](Slides-Week-12R_files/figure-html/sw12_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw12-auto[ ```r null_dist_sim %>% visualize() + * shade_p_value(observed_indep_statistic, * direction = "greater") ``` ] .panel2-sw12-auto[ ![](Slides-Week-12R_files/figure-html/sw12_auto_03_output-1.png)<!-- --> ] <style> .panel1-sw12-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw12-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw12-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Visualize both null distributions and the test statistic count: false .panel1-sw13-auto[ ```r *null_dist_sim ``` ] .panel2-sw13-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## <int> <dbl> ## 1 1 8.65 ## 2 2 3.54 ## 3 3 2.99 ## 4 4 20.2 ## 5 5 2.74 ## 6 6 6.43 ## 7 7 7.42 ## 8 8 3.58 ## 9 9 1.58 ## 10 10 1.69 ## # … with 990 more rows ``` 
] --- count: false .panel1-sw13-auto[ ```r null_dist_sim %>% * visualize(method = "both") ``` ] .panel2-sw13-auto[ ``` ## Warning: Check to make sure the conditions have been met for the theoretical ## method. {infer} currently does not check these for you. ``` ![](Slides-Week-12R_files/figure-html/sw13_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw13-auto[ ```r null_dist_sim %>% visualize(method = "both") + * shade_p_value(observed_indep_statistic, * direction = "greater") ``` ] .panel2-sw13-auto[ ``` ## Warning: Check to make sure the conditions have been met for the theoretical ## method. {infer} currently does not check these for you. ``` ![](Slides-Week-12R_files/figure-html/sw13_auto_03_output-1.png)<!-- --> ] <style> .panel1-sw13-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw13-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw13-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Calculate the *p*-value from the observed statistic and null distribution count: false .panel1-sw14-auto[ ```r *null_dist_sim ``` ] .panel2-sw14-auto[ ``` ## Response: college (factor) ## Explanatory: finrela (factor) ## Null Hypothesis: independence ## # A tibble: 1,000 × 2 ## replicate stat ## <int> <dbl> ## 1 1 8.65 ## 2 2 3.54 ## 3 3 2.99 ## 4 4 20.2 ## 5 5 2.74 ## 6 6 6.43 ## 7 7 7.42 ## 8 8 3.58 ## 9 9 1.58 ## 10 10 1.69 ## # … with 990 more rows ``` ] --- count: false .panel1-sw14-auto[ ```r null_dist_sim %>% * get_p_value(obs_stat = observed_indep_statistic, * direction = "greater") ``` ] .panel2-sw14-auto[ ``` ## Warning: Please be cautious in reporting a p-value of 0. This result is an ## approximation based on the number of `reps` chosen in the `generate()` step. See ## `?get_p_value()` for more information. 
``` ``` ## # A tibble: 1 × 1 ## p_value ## <dbl> ## 1 0 ``` ] <style> .panel1-sw14-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw14-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw14-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- If there were really no relationship between education and income, the simulation-based estimate of the probability of seeing a statistic as or more extreme than 30.6825231 is approximately 0. Because that estimate rests on only 1,000 permutations, it is safer to report it as *p* < 0.001 than as exactly 0. --- # Calculate the *p*-value using the true `\(\chi^2\)` distribution ``` ## X-squared ## 1.082094e-05 ``` -- ``` ## # A tibble: 1 × 3 ## statistic chisq_df p_value ## <dbl> <int> <dbl> ## 1 30.7 5 0.0000108 ``` --- Take a look at the self-identified income class of our survey respondents. Suppose our null hypothesis is that `finrela` follows a uniform distribution (i.e., equal proportions of respondents describe their income as far below average, below average, average, above average, or far above average, or say they don't know).
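
To make the uniform null concrete, here is a minimal base-R sketch. It assumes the 500 respondents shown in the `gss` printout and six `finrela` categories; the category labels and the names `finrela_levels`, `null_props`, and `expected_counts` are illustrative assumptions, not part of the original analysis.

```r
# Assumed labels for the six finrela categories (check levels(gss$finrela)).
finrela_levels <- c("far below average", "below average", "average",
                    "above average", "far above average", "DK")

# Under a uniform null, every category has the same probability.
null_props <- setNames(
  rep(1 / length(finrela_levels), length(finrela_levels)),
  finrela_levels
)

# Assumption: n = 500, the number of rows in the gss printout.
n <- 500
expected_counts <- n * null_props

# Expected number of responses per category under the null.
round(expected_counts, 1)
```

A named vector shaped like `null_props` is what `hypothesize(null = "point", p = ...)` takes for a categorical response. With 500 rows, the uniform expectation works out to about 83.3 responses per category; the red reference line on the bar chart that follows is drawn at 466.3, which does not match that figure, so it is worth checking against your own copy of the data.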
--- count: false .panel1-sw15-auto[ ```r *gss ``` ] .panel2-sw15-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw15-auto[ ```r gss %>% * ggplot2::ggplot() ``` ] .panel2-sw15-auto[ ![](Slides-Week-12R_files/figure-html/sw15_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw15-auto[ ```r gss %>% ggplot2::ggplot() + * ggplot2::aes(x = finrela) ``` ] .panel2-sw15-auto[ ![](Slides-Week-12R_files/figure-html/sw15_auto_03_output-1.png)<!-- --> ] --- count: false .panel1-sw15-auto[ ```r gss %>% ggplot2::ggplot() + ggplot2::aes(x = finrela) + * ggplot2::geom_bar() ``` ] .panel2-sw15-auto[ ![](Slides-Week-12R_files/figure-html/sw15_auto_04_output-1.png)<!-- --> ] --- count: false .panel1-sw15-auto[ ```r gss %>% ggplot2::ggplot() + ggplot2::aes(x = finrela) + ggplot2::geom_bar() + * ggplot2::geom_hline(yintercept = 466.3, * col = "red") ``` ] .panel2-sw15-auto[ ![](Slides-Week-12R_files/figure-html/sw15_auto_05_output-1.png)<!-- --> ] --- count: false .panel1-sw15-auto[ ```r gss %>% ggplot2::ggplot() + ggplot2::aes(x = finrela) + ggplot2::geom_bar() + ggplot2::geom_hline(yintercept = 466.3, col = "red") + * ggplot2::labs(x = "finrela: Self-Identification of Income 
Class", * y = "Number of Responses") ``` ] .panel2-sw15-auto[ ![](Slides-Week-12R_files/figure-html/sw15_auto_06_output-1.png)<!-- --> ] --- count: false .panel1-sw15-auto[ ```r gss %>% ggplot2::ggplot() + ggplot2::aes(x = finrela) + ggplot2::geom_bar() + ggplot2::geom_hline(yintercept = 466.3, col = "red") + ggplot2::labs(x = "finrela: Self-Identification of Income Class", y = "Number of Responses") + * theme_minimal() ``` ] .panel2-sw15-auto[ ![](Slides-Week-12R_files/figure-html/sw15_auto_07_output-1.png)<!-- --> ] <style> .panel1-sw15-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw15-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw15-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- It seems that a uniform distribution may not be the most appropriate description of the data: many more people describe their income as average than any of the other options. Let's now test whether this departure from a uniform distribution is statistically significant.
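
The test statistic used on the following slides is Pearson's chi-squared goodness-of-fit statistic, `\(\sum (O - E)^2 / E\)`. As a minimal sketch, the base-R code below computes it by hand and cross-checks it against `chisq.test()`. The counts in `observed` are made-up numbers chosen only to sum to 500; they are not the actual `gss` tallies, and all object names are illustrative.

```r
# Hypothetical counts for the six finrela categories (NOT the real gss tallies).
observed <- c("far below average" = 30, "below average" = 95,
              "average" = 240, "above average" = 100,
              "far above average" = 25, "DK" = 10)

# Uniform null: each category is expected to hold 1/6 of the responses.
null_props <- rep(1 / 6, length(observed))
expected   <- sum(observed) * null_props

# Pearson's chi-squared statistic: sum of (observed - expected)^2 / expected.
chisq_stat <- sum((observed - expected)^2 / expected)

# Theory-based p-value: upper tail of a chi-squared distribution
# with (number of categories - 1) degrees of freedom.
df      <- length(observed) - 1
p_value <- pchisq(chisq_stat, df = df, lower.tail = FALSE)

# Cross-check against base R's built-in goodness-of-fit test.
builtin <- chisq.test(observed, p = null_props)
all.equal(chisq_stat, unname(builtin$statistic))
```

With the real data, `observed` would presumably come from something like `table(gss$finrela)`. The same upper-tail calculation accounts for the theory-based *p*-value shown earlier for the independence test: `pchisq(30.6825231, df = 5, lower.tail = FALSE)` gives about 1.08e-05.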
--- # Calculate the observed statistic count: false .panel1-sw16-auto[ ```r *gss ``` ] .panel2-sw16-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw16-auto[ ```r gss %>% * specify(response = finrela) ``` ] .panel2-sw16-auto[ ``` ## Response: finrela (factor) ## # A tibble: 500 × 1 ## finrela ## <fct> ## 1 below average ## 2 below average ## 3 below average ## 4 above average ## 5 above average ## 6 average ## 7 below average ## 8 above average ## 9 average ## 10 far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw16-auto[ ```r gss %>% specify(response = finrela) %>% * hypothesize(null = "point", * p = c("far below average" = 1/6, * "below average" = 1/6, * "average" = 1/6, * "above average" = 1/6, * "far above average" = 1/6, * "DK" = 1/6)) ``` ] .panel2-sw16-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 500 × 1 ## finrela ## <fct> ## 1 below average ## 2 below average ## 3 below average ## 4 above average ## 5 above average ## 6 average ## 7 below average ## 8 above average ## 9 average ## 10 far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw16-auto[ ```r gss %>% 
specify(response = finrela) %>% hypothesize(null = "point", p = c("far below average" = 1/6, "below average" = 1/6, "average" = 1/6, "above average" = 1/6, "far above average" = 1/6, "DK" = 1/6)) %>% * calculate(stat = "Chisq") ``` ] .panel2-sw16-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 1 × 1 ## stat ## <dbl> ## 1 488. ``` ] <style> .panel1-sw16-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw16-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw16-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- --- # Generate a null distribution assuming each income class is equally likely count: false .panel1-sw17-auto[ ```r *gss ``` ] .panel2-sw17-auto[ ``` ## # A tibble: 500 × 11 ## year age sex college partyid hompop hours income class finrela weight ## <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl> <ord> <fct> <fct> <dbl> ## 1 2014 36 male degree ind 3 50 $2500… midd… below … 0.896 ## 2 1994 34 female no degree rep 4 31 $2000… work… below … 1.08 ## 3 1998 24 male degree ind 1 40 $2500… work… below … 0.550 ## 4 1996 42 male no degree ind 4 40 $2500… work… above … 1.09 ## 5 1994 31 male degree rep 2 40 $2500… midd… above … 1.08 ## 6 1996 32 female no degree rep 4 53 $2500… midd… average 1.09 ## 7 1990 48 female no degree dem 2 32 $2500… work… below … 1.06 ## 8 2016 36 female degree ind 1 20 $2500… midd… above … 0.478 ## 9 2000 30 female degree rep 5 40 $2500… midd… average 1.10 ## 10 1998 33 female no degree dem 2 40 $1500… work… far be… 0.550 ## # … with 490 more rows ``` ] --- count: false .panel1-sw17-auto[ ```r gss %>% * specify(response = finrela) ``` ] .panel2-sw17-auto[ ``` ## Response: finrela (factor) ## # A tibble: 500 × 1 ## finrela ## <fct> ## 1 below average ## 2 below average ## 3 below average ## 4 above average ## 5 above average ## 6 
average ## 7 below average ## 8 above average ## 9 average ## 10 far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw17-auto[ ```r gss %>% specify(response = finrela) %>% * hypothesize(null = "point", * p = c("far below average" = 1/6, * "below average" = 1/6, * "average" = 1/6, * "above average" = 1/6, * "far above average" = 1/6, * "DK" = 1/6)) ``` ] .panel2-sw17-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 500 × 1 ## finrela ## <fct> ## 1 below average ## 2 below average ## 3 below average ## 4 above average ## 5 above average ## 6 average ## 7 below average ## 8 above average ## 9 average ## 10 far below average ## # … with 490 more rows ``` ] --- count: false .panel1-sw17-auto[ ```r gss %>% specify(response = finrela) %>% hypothesize(null = "point", p = c("far below average" = 1/6, "below average" = 1/6, "average" = 1/6, "above average" = 1/6, "far above average" = 1/6, "DK" = 1/6)) %>% * generate(reps = 1000, type = "draw") ``` ] .panel2-sw17-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 500,000 × 2 ## # Groups: replicate [1,000] ## finrela replicate ## <fct> <fct> ## 1 average 1 ## 2 above average 1 ## 3 DK 1 ## 4 far below average 1 ## 5 above average 1 ## 6 far above average 1 ## 7 far above average 1 ## 8 above average 1 ## 9 below average 1 ## 10 above average 1 ## # … with 499,990 more rows ``` ] --- count: false .panel1-sw17-auto[ ```r gss %>% specify(response = finrela) %>% hypothesize(null = "point", p = c("far below average" = 1/6, "below average" = 1/6, "average" = 1/6, "above average" = 1/6, "far above average" = 1/6, "DK" = 1/6)) %>% generate(reps = 1000, type = "draw") %>% * calculate(stat = "Chisq") ``` ] .panel2-sw17-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## <fct> <dbl> ## 1 1 8.51 ## 2 2 13.8 ## 3 3 4.55 ## 4 4 2.94 ## 5 5 9.35 ## 6 6 6.04 ## 7 7 5.94 ## 8 8 7.10 ## 9 9 2.73 ## 
10 10 5.99 ## # … with 990 more rows ``` ] <style> .panel1-sw17-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw17-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw17-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- --- # Visualize the null distribution and test statistic count: false .panel1-sw18-auto[ ```r *null_dist_gof ``` ] .panel2-sw18-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## <fct> <dbl> ## 1 1 7.22 ## 2 2 6.71 ## 3 3 1.41 ## 4 4 11.7 ## 5 5 6.21 ## 6 6 8.63 ## 7 7 6.47 ## 8 8 5.8 ## 9 9 8.75 ## 10 10 7.65 ## # … with 990 more rows ``` ] --- count: false .panel1-sw18-auto[ ```r null_dist_gof %>% * visualize() ``` ] .panel2-sw18-auto[ ![](Slides-Week-12R_files/figure-html/sw18_auto_02_output-1.png)<!-- --> ] --- count: false .panel1-sw18-auto[ ```r null_dist_gof %>% visualize() + * shade_p_value(observed_gof_statistic, * direction = "greater") ``` ] .panel2-sw18-auto[ ![](Slides-Week-12R_files/figure-html/sw18_auto_03_output-1.png)<!-- --> ] <style> .panel1-sw18-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw18-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw18-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- # Calculate the *p*-value count: false .panel1-sw19-auto[ ```r *null_dist_gof ``` ] .panel2-sw19-auto[ ``` ## Response: finrela (factor) ## Null Hypothesis: point ## # A tibble: 1,000 × 2 ## replicate stat ## <fct> <dbl> ## 1 1 7.22 ## 2 2 6.71 ## 3 3 1.41 ## 4 4 11.7 ## 5 5 6.21 ## 6 6 8.63 ## 7 7 6.47 ## 8 8 5.8 ## 9 9 8.75 ## 10 10 7.65 ## # … with 990 more rows ``` ] --- count: false 
.panel1-sw19-auto[ ```r null_dist_gof %>% * get_p_value(observed_gof_statistic, * direction = "greater") ``` ] .panel2-sw19-auto[ ``` ## Warning: Please be cautious in reporting a p-value of 0. This result is an ## approximation based on the number of `reps` chosen in the `generate()` step. See ## `?get_p_value()` for more information. ``` ``` ## # A tibble: 1 × 1 ## p_value ## <dbl> ## 1 0 ``` ] <style> .panel1-sw19-auto { color: white; width: 44.3333333333333%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel2-sw19-auto { color: white; width: 53.6666666666667%; hight: 32%; float: left; padding-left: 1%; font-size: 80% } .panel3-sw19-auto { color: white; width: NA%; hight: 33%; float: left; padding-left: 1%; font-size: 80% } </style> --- If each self-identified income class were equally likely to occur, the simulation-based estimate of the probability of seeing a distribution like the one we observed is approximately 0. --- # Calculate the *p*-value using the true `\(\chi^2\)` distribution ``` ## [1] 3.131231e-103 ``` --- ## That's it!